Experiments in speaker normalisation and adaptation for large vocabulary speech recognition
Abstract
This paper examines techniques for speaker normalisation and adaptation that are applied in training with the aim of removing some of the variability from the speaker independent models. Two techniques are examined: vocal tract normalisation (VTN), which estimates a single "vocal tract length" parameter for each speaker and then modifies the speech parameterisation accordingly, and speaker adaptive training (SAT), which estimates Gaussian mean and variance parameters jointly with a speaker specific set of maximum likelihood linear regression (MLLR) based transformations. It is shown that VTN is effective for both clean speech and mismatched conditions and that the further improvements obtained by applying MLLR in testing are essentially additive. Detailed results from the use of SAT show that worthwhile improvements over using MLLR with standard speaker independent models are obtained.
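The two operations named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the warping function or the transform estimation procedure, so the linear warp, the clamp at an assumed 8 kHz upper band edge, and the function names `warp_frequency` and `mllr_adapt_mean` are all assumptions made for illustration. The sketch only shows how a per-speaker warp factor rescales the frequency axis and how an affine MLLR-style transform moves a Gaussian mean.

```python
def warp_frequency(freq_hz, alpha, max_hz=8000.0):
    """Scale a frequency by a per-speaker warp factor (assumed linear warp).

    The result is clamped to max_hz so warped frequencies stay inside the
    analysis band. Both the linear form and max_hz are illustrative choices.
    """
    return min(alpha * freq_hz, max_hz)


def mllr_adapt_mean(mean, A, b):
    """Apply an affine transform to a Gaussian mean: adapted = A * mean + b.

    A is a list of rows (a square matrix), b is a bias vector. Estimating
    A and b from adaptation data is not shown here.
    """
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical values, chosen only to exercise the functions.
warped = warp_frequency(1000.0, 1.1)
adapted = mllr_adapt_mean([1.0, 2.0], [[1.0, 0.5], [0.0, 2.0]], [0.1, -0.2])
```

In a real system the warp factor and the transform parameters would be estimated per speaker from data, typically by maximising likelihood; here they are simply supplied as inputs.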
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Pitch adaptive features for LVCSR
We have investigated the use of a pitch adaptive spectral representation on large vocabulary speech recognition, in conjunction with speaker normalisation techniques. We have compared the effect of a smoothed spectrogram to the pitch adaptive spectral analysis by decoupling these two components of STRAIGHT. Experiments performed on a large vocabulary meeting speech recognition task highlight th...
The 1998 HTK system for transcription of conversational telephone speech
This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, sidebased cepstral fea...
Transformation and Combination of Hidden Markov Models for Speaker Selection Training
This paper presents a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for test speaker using Gaussian mixture model, which is more reliable given very limited adaptation data. Then cohort models are linearly transformed closer to each test speaker. Finally the adapted model for the test speaker is obtained by combining these transf...
Publication date: 1997